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^ ■ Abstract 

- ^ , We introduce the "Sequential Empirical Bayes Method" , an adaptive constrained-curve fitting 

XI 

H ' procedure for extracting reliable priors. These are then used in standard augmented-x^ fits on 

separate data. This better stabilizes fits to lattice QCD overlap-fermion data at very low quark 
mass where a priori values are not otherwise known. Lessons learned (including caveats limiting the 
scope of the method) from studying artificial data are presented. As an illustration, from local-local 
two-point correlation functions, we obtain masses and spectral weights for ground and first-excited 
states of the pion, give preliminary fits for the oq where ghost states (a quenched artifact) must be 
dealt with, and elaborate on the details of fits of the Roper resonance and S'ii(A^^/^^) previously 
presented elsewhere. The data are from overlap fermions on a quenched 16'^ x 28 lattice with spatial 
size La = 3.2 fm and pion mass as low as ~ 180 MeV. 
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I. INTRODUCTION 



The recent advocacy of the use of Bayesian statistics for the analysis of data from lattice 
simulationsL_in the guise of the methods of constrained curve fittiner 111 or maximum 
entropy SQQ. has eased eons.de.ab.y the an*ig.ty and imitation assoeiated with est,- 

mating the systematic errors due to curve fitting, especially when extracting masses, spectral 
weights and matrix elements from Monte Carlo estimates of correlation functions. 

Previously, Monte Carlo estimates, {G{t)), of two-point hadronic correlators had been fit 
to a theoretical model, such as 

oo 

G{t;w,,m,) = Y.^^e'""'' (1) 

i=l 

where Wi is the spectral weight of the i*^ state, by the maximum-likelihood procedure of 
minimizing the 

X'iw,, m,) = HGit)) - Git; w,, m,)) ((G(t')) - w,, m,)) (2) 



t,t' 



with covariance matrix 



a 



t,t' 



Traditionally, these had been fit only at large Euclidean times t > t^in, where contributions 
from excited states are exponentially damped. The art had been to choose a value of tmin 
which compromises between unnecessarily high statistical errors for large tmin and high 
systematic errors (from contamination from excited states) for small t^[^. Lattice alchemy 
provided various recipes for making the compromise and estimating the systematic errors, 
but the procedures were often suspect and always frustrating. 

The truncation of the data set to only a few large t was deemed necessary because the 
alternative (of including more time slices but also more terms in the fit model) resulted 
in unacceptably unstable fits to the sum of decaying exponentials (traditionally a bane 
of numerical analysts). Success was achieved in some cases by enlarging the data set by 
including more channels, e.g. diagonalization of multi-source multi-exponential fits. Indeed 
when correlators from very many sources could be calculated cheaply, such as for glueballs 
or static quarks, the improvement was dramatic. But most often, when only a couple of 
channels at best could be fit simultaneously, the competition between increased statistical 



errors for large tmin and large systematic errors for small tmin remained; although the final 
statistical and systematic errors were reduced, the effort and uncertainty in obtaining a 
reliable systematic error remained. 

Constrained curve fitting offers the alternative of minimizing an augmented x , 

Xaug X ~l~ Xprior (4) 
Xprior ^ y '-^ (^) 

where pi denotes the collective parameters of the fit (e.g. pi = {wi,mi} for a sum of ex- 
ponentials), as a way of achieving stability by "guiding the fit" with the use of Bayesian 
priors, that is, values of the parameters obtained from a priori estimates pi = pi ±5"j. With 
improved stability, the data sets can be enlarged to include small t and the theory can 
be enlarged by including many more terms in the fit model until convergence is obtained. 
The systematic error associated with the choice of t^^^ is thereby largely absorbed into the 
statistical error. 

The advantage that the constrained curve fitting of lattice data has over a typical data 
set that a numerical generalist would consider, is that often we have reliable estimates of, 
or at least constraints on, the fit parameters from outside the data (for example, the masses 
must be positive, or the level spacing is expected to be such-and-such from reliable models) 
which can then be used as Bayesian priors. Examples, such as upsilon spectroscopy l| 
where the level spacing can be reliably estimated from quark models and experiments, are 
impressive. Remarkably, constrained curve fitting with Bayesian priors on such data has 
been able to give satisfactory fits for local-local correlation functions, i.e. when multi-source 
fits are unavailable (presumably due to prohibitive cost). 

But with our recent data p|, we enter previously unexplored territory. We work with 
overlap fermions with exact chiral symmetry at unprecedented small quark mass and large 
spatial volume. The literature, from which to obtain estimates to be used as priors, is limited. 
Furthermore, the details of the level spacings (e.g. the Roper resonance and the A(1405)) 
are hotly debated between advocates of quark models versus those of chiral models. The use 
of priors in standard constrained curve fitting tends to "lock in" the fit (within a sigma or 
so); if one gets them badly wrong, then the fitted results may be misleading. Furthermore, 
the stability of the fit results against choice of prior must be tested - this reintroduces an 
element of subjectivity. As a modification of the basic Bayesian-prior constrained-curve 



fitting (augmented x^) procedure, we propose to make it more automatic, and to further 
absorb systematic errors associated witli clioice of prior into statistical errors. 

In section II, we give an overview of the "Sequential Empirical Bayes Method" detailing 
our extension of constrained curve-fitting. In Section III, we add some further improvements 
to better assess and reduce systematic errors, and study fits to artificial data where the true 
values of the parameters are known. In Section IV we give, as an illustration of the efficacy 
of the algorithm, some results from our low quark mass overlap fermion data for the excited 
states of the pion, present preliminary fits for the oq where ghost states (a quenched artifact) 
must be dealt with, and comment on the details of fits of the Roper resonance and SuiN^^^^) 

n 

previously presented elsewhere p. Our summary and conclusions follow in Section V. 



II. THE SEQUENTIAL EMPIRICAL BAYES METHOD 

Bayesian statistics is an entire field in itself, with an old and broad history; for an intro- 
duction, see 0,0- Empirical Bayes methods are intermediate between classical ("frequen- 
tist") and Bayesian methods. The core ideas of a sequential analysis of the data originate 
with Robbins . We propose the Sequential Empirical Bayes (SEB) method as a 

refinement especially well suited for the special properties of lattice Monte Carlo correlation 
functions. 

We begin in subsection III Al by giving a brief description of the standard constrained- 
curve fitting approach emphasizing its basis in Bayesian probability theory. Our sketch 




121; indeed, our notation is an 



relies heavily on a few recent and relevant summaries 
amalgam of theirs. We follow with a descriptive overview of our method in subsection III Bl 
and a more detailed rendering of the basic algorithm in subsection III CI 



A. Synopsis of Standard Constrained-Curve Fitting 

To account for the more general case of multi-source fits, as well as to allow the time 
dependence to be non-exponential (periodic boundary conditions result in cosh or sinh, and 
quenched artifacts can give rise to more complicated temporal dependence), we write 

X\p) = EiMaip) - D^)a-j{Mp{p) - Dp) (6) 

0/3 
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where p are the collective parameters of the fit (e.g. pi = {wi, rrii} for a sum of exponentials), 
the indices a, (3 distinguish different values of the independent variable (time for correlation 
functions) and different interpolating fields, D are the Monte Carlo data, Ma{p) is the fit 
model, and o"^^ is the covariance matrix. 

Minimizing the iii Eq. [Ul is the solution of the problem of determining the set of fit 
parameters p which maximizes P{D\p), the conditional probability of measuring the data 
D given a set of parameters, also known as the "likelihood" of the data. Bayesian inference 
turns this question around and demands that the solution of the curve-fitting problem consist 
of determining the set of parameters p which maximizes P{p\D), the conditional probability 
that p is correct given the measured data D. That is, Bayesian inference asks which fit-model 
parameters are most likely given the data. 

The computation of the latter conditional probability is possible because of the celebrated 
Bayes' theorem 

Pipin) ^ ™^ (7) 

PiD\p)P{p) 



JdpPiD\p)Pip) 

which follows directly from the elementary properties of probability theory 

P{p\D)P{D) = P{pnD) (9) 
= P{D\p)P{p) (10) 

The unconditional probability P{p) is the plausibility one assigns to the parameters p be- 
fore the additional information of the fit is provided, and is known as the "Bayesian prior 
distribution". The conditional probability P{p\D) is known as the "posterior probability dis- 
tribution" ; it is the reassessment of the likelihood of the parameters after the fit incorporates 
the data. The unconditional probability P{D), known as the "prior predictive probability" 
of the data, is independent of p and thus serves only as a normalization constant; it is 
determined from Eq. [7|and the normalization condition / dpP{p\D) = 1. 

Heuristically, Bayes' theorem can be thought of as "posterior probability" oc "likeli- 
hood" X "prior probability". Traditional inference ( "frequentist theory") uses the likeli- 
hood P{D\p) to quantify all statistical measures of inference (mean, standard deviation, x^, 
confidence limits, ■■■), while Bayesian inference uses the posterior distribution P{p\D) to 
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determine the statistics. From Eq. [71 Bayesian theory requires augmenting the information 
that the frequentist theory uses with the additional estimate of the prior distribution P{p). 
Of course, the two methods agree when the prior distribution is chosen to be constant. 

The hkehhood P{D\p), needed by both frequentists and Bayesians, can be estimated 
if the Monte Carlo data set is sufficiently large. Then, regardless of the statistics of the 
underlying population distribution (the ensemble of all possible configurations), the sample 
distribution (Monte-Carlo-generated averages over finite sets of configurations) will have 
Gaussian statistics, as assured by the Central Limit Theorem. Thus 

P{D\p) cx exp(-xV2) (11) 

with the given by Eq. [HI 

The Bayesian prior distribution P{p) is the probability that a particular set of parameters 
p is correct, a priori to the data analysis. It is implicit in that the Bayesian probabilities are 
conditional on some background information. Operationally, P{p) serves as a constraint on 
the parameters p. For the physicist, these might be constraints on the ranges of parametric 
values that are physically feasible (e.g "positive energy splittings"), or perhaps something 
stronger as dictated by experience from previous similar fits or guidance from models of 
QCD or experiments. 

A simple choice for the prior distribution is Gaussian 

Pip) oc exp{-xl,,j2) (12) 

with Xprior defined in Eq. |S| Then P{p\D) oc P{D\p)P{p) of Eq. [7|is maximized by minimiz- 
ing the augmented ixiug = + Xprior) -^l- S This is the approach outlined in 

m 

where they choose values for the priors (p) = p and (p^) — (p)^ = 5"^ based on physicists' 
intuition. 

The opposite extreme is the use of the "entropic prior" , based on the view that for some 
fits where the number of parameters is larger than the number of measured data (such as for 
the spectral density function j^, 0, 0| , or in a wider context image restoration) only minimal 
information about the priors is available. 

We seek an approach between the two extremes. We use the language of augmented 
but seek to use a subset of the available data to estimate the priors in an orderly fashion, 
which will then be used on new data. 
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B. Overview of the SEB Method 



So how do we obtain our priors? For concreteness, consider again the fit model of the 
sum of decaying exponentials in Eq. [T] Apportion the data into a nested set (picture the 
shells in an onion or a Kewpie doll) and extract estimates for priors in a progressive manner, 
at each stage obtaining estimates of one or two new priors upon each expansion of the data 
set. An especially natural and well-suited nesting is to partition the hadronic correlator data 
via Euclidean time slices. That is, the first and smallest data set includes all configurations 
but with times slices t restricted to t > tgtart (and perhaps t < t^ax depending on boundary 
conditions). Subsequently, the n*^ data set includes all time slices such that t > tstart — 
{n — 1) At, where n = 1, 2, ■ ■ ■ and At is most simply chosen to be the constant 2. (A more 
general choice is made in practice, as described later.) The time tstart is chosen by the same 
criteria used in traditional fits so that an unconstrained fit is well approximated by a single 
exponential. The output values for two parameters, the ground state mass and spectral 
weight, are then used as priors for the next fit on the augmented data set. For the second 
fit, two (or more, in general) additional time slices are included, as are two more parameters 
in the fit model, the weight and mass of the first excited state. The new fit is constrained 
with regard to the ground state but unconstrained with regard to the first excited state. 
The four fitted parameters are then used as priors in a third fit on a larger data set, and so 
on until all desired time slices and/or terms in the fit are included. 

In this way, we estimate the priors from a subset of the data. Furthermore, we choose that 
subset of the data which is best suited for making the estimation. The choice is determined 
with the same compromise as is used for traditional unconstrained fits, namely between 
minimizing statistical errors (which grow as t increases) and minimizing the contamination 
of higher excited states (which grows as t decreases). 

Thus we propose an adaptive self-contained constrained curve-fitting procedure, dubbed 
the "Sequential Empirical Bayes" method. In a nutshell, we obtain the priors gradually 
(allowing them to change as needed), from the ground state up, as the data set is mono- 
tonically enlarged by including earlier and earlier time slices. Its advantages include that 
it is usable whenever external reliable estimates of the priors are not available, and it is as 
automatic as one could hope for, thereby reducing the potential to introduce bias and of 
course decreasing the frustrating busy-work of fitting. Especially for the low-lying states, if 
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the initial priors are estimated incorrectly, there are several subsequent steps by which they 
may change (by about a sigma each time). 

C. The Basic Algorithm 

From experience we have discovered that for some data, either "real" (actual lattice 
gauge theory data) or artificially constructed, a very basic algorithm which incorporates the 
SEB philosophy is adequate for producing reliable priors. However, some of our data have 
exposed deficiencies in the basic algorithm. These have been corrected without violating the 
spirit of the SEB, but make the resulting final algorithm rather mysterious and complicated 
to describe in one pass. Accordingly, in this paper, we begin here by outlining the simplest 
algorithm as a template. In Sect. Illll we will introduce modifications as necessary. 

Table E] describes for simplicity the very basic "fixed At" algorithm, in which after each 
pair of steps. At = 2 new (earlier) time slices are added to the data to be fitted. The 
algorithm is as follows: 

• Choose tmax and tmin, the maximum window over which the fits will be done. The 
window is to be chosen as large as possible. For correlation functions expected, on 
theoretical grounds, to be positive then if a value of a correlation function at a time 
slice is not within one sigma of being positive, then that time slice and all greater time 
slices are eliminated from consideration. This prevents noisy correlation functions at 
large t from giving grossly inaccurate estimates of the fitted values to be subsequently 
used as priors. 

• Determine the number of the terms we want to use in the fit model, and determine 
^start as the starting point for the fit. Ensure that (tstart — ^min) > # terms x At 

• Choose central trial values Wi and mi (or Ei) equal to those obtained from the effective 
mass relations. Loop on various trial values around these central values. For each, use 
an unconstrained fit on the one-mass-term model to fit the correlator data including 
time slices tstart to tmax and obtain w^^ ± a'^^ and m^i^ ± a^j . Choose as input for the 
next step those values which yield the lowest (but reasonable) x^/dof. 

• Using these values of Wi ± o"^^ and mi ± a„i-^ as both priors and initial values, do 
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TABLE I: Tabled The very basic "fixed At" algorithm, in which after each pair of steps, At = 2 
new (earher) time sHces are added to the data to be fitted. For simphcity, the fit model of Eq. ^is 
assumed in this example. Refer to the text for the meaning of the superscripts and subscripts on 
the masses m and spectral weights w. 
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a constrained curve fit (using the one-mass-term model on the data set enlarged to 
include tstart — 1) to obtain ± a^l and mf'' ± a^j. 

• Loop on a wide range of trial values for W2 and 1712- With a two-mass-term model, 
constrain the first mass and weight (using the previous output as both priors and 
initial values) but leave the second mass and weight unconstrained. Loop on various 
trial values for the latter. Do this half-constrained fit on the data set enlarged to 



9 



include tstart — 2 and obtain ^2 ± cr^j "^2 -'- '^ml- Choose as input for the next 
step those values which yield the lowest (but reasonable) x^/dof. 

• Using these values of Wi ± a^^, rrii ± am-^, W2 ± <Jw2y ^2 ± o'm2 as both priors and 
initial values, do a fully-constrained fit (using the two-mass-term model on the data 
set enlarged to include tgtart — 3) to obtain w[^^ ± a^^, m^i^ ± cr^|, W2^^ ± a^^, and 

• Repeat the last two steps until all desired mass terms and time slices are included. 
One thus obtains a complete set of priors. 

• Add the final time slice tmin and do a fully-constrained fit using previously obtained 
values for priors and initial guesses. 

All fits are correlated using the full covariance matrix. Furthermore, the entire process 
is bootstrapped (or jackknifed). Final quoted errors are bootstrap (or jackknife) errors. 
Within a bootstrap sample, at intermediate steps in the algorithm, the sigmas of the priors, 
CTp, are obtained from the fitting errors of the previous step. 

As a variation, rather than deciding a priori on the number of terms in the fit and adding 
time slices one at time, one can let the data decide how many time slices to include with 
each enlargement of the data by choosing the minimum x^/dof over a range of reasonable 
possibihties. Thus, for example, if the data is dominated by the ground state for many time 
slices, then many time slices will be automatically added before an attempt is made to fit 
the first-excited state. We will refer to this improvement as the "variable At" approach, and 
to the basic template described in this section as the "fixed At" approach. 



III. ASSESSING SYSTEMATIC ERRORS 

The minimization can be cast more generally as minimizing a functional A{p) > of 
the vector p (see, for example, In the extreme case, with more unknown parameters 

than data points, the minimization procedure becomes degenerate and there is enough free- 
dom to drive down to unreasonably small values. With more data points than unknown 
parameters, although the agreement with the data typically is good (often too good to be- 
lieve), the solution is unstable and often wildly oscillating. This signals a proximity to the 
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degeneracy of the underlying minimization problem. Constrained curve fitting can be cast 
more generally as the problem of minimizing A{p) + \B{p) 

^ {A{p) + XB{p)) = (13) 

If A{p) is degenerate, but B{p) is not, the degeneracy is lifted. The stability is improved in 
general. The solution p(A) varies along a "trade-off curve" (see, for example, A (e.g. 

X^) measures the agreement of the model to the data, while B (e.g. Xprior) plays the role of 
a stabilizing functional. The standard constrained-curve fitting procedure selects A = 1. 

To estimate the systematic errors associated with the choice of priors, it is common to 
repeat the procedure with all (Xp replaced by rjdp where = 2 is a typical choice. This moves 
one along the trade-off curve from A = 1 to A = 1/rf., giving less weight to the priors. More 
generally, one could have an independent weighting factor, 77^, multiplying the di of each 
prior. 



A. Bells and Whistles 

1. "Global Dynamical Weight" 

For concreteness, we return to the simple case of a sum of decaying exponentials of 
Eqs. d 12 

Xprior ~ X] „2^2 {^^) 

By choosing rjj > 1, one relaxes the constraint of the priors and thus tests the sensitivity 
of the fitted values to the choice of prior. We present three ways, in order of increasing 
sophistication, to implement the choice of each rji. 

(a) Global Static Weight: The canonical approach is to keep it unchanged as a global 
constant for the fit, rjf = 1/A. The variation of the dependence of the output fitted 
results on this global input parameter may be used to estimate the systematic error. 

(b) Local Variable Weight: Allow each rji to be a local variable. At each step, loop on 
various values of r]i and choose the value which minimizes the x^/dof. Monitor against 
runaway solutions. 
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FIG. 1: Plot of ground-state pion mass, niTra (left) and spectral weight w (right), versus global 
static weight, A, for bare quark mass ma = 0.226. 
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FIG. 2: Same as in Fig.^but for the nucleon for bare quark mass 



ma 



0.188 (m^ = 583(3) MeV). 



(c) Local Dynamical Weight: For some quantities, such as for the ground-state mass and 
spectral weight of the pion displayed in Fig. [H the dependence of the fitted results on 
the value of A is quite mild. For others, such as for the nucleon of Fig. |2l the depen- 
dence is stronger. Accordingly, we propose an adaptive procedure of absorbing any 
systematic error associated with the choice of rj into the statistical error, by upgrading 
each rji to a dynamical variable to be determined by the fit. It is incorporated as a 
dynamical fit variable by further augmenting the x^'- 
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The choice of fji and 5"^ is also somewhat arbitrary, and results in some uncertainty in 
assessing the systematic error; however, the dependence on these parameters is gentler 
than on the corresponding static global values and any remaining systematic error is 
masked by statistical error. 

(d) Global Dynamical Weight: In our fits presented here, we made the simplest choice of 
a special case of (c): we took all rji for different parameters to be the same global 77, 
and took fj = 1 and = 1. 

Often, the dependence of the results of the fit on the Global Static Weight is sufficiently 
smooth that the use of Global Dynamical Weight is unnecessary. 

2. "Releasing the Constraint" 

A way to mitigate the potentially aggressive nature of the fully-constrained fit in the 
Sequential Empirical Bayes Method is the following: When one is interested in estimating 
a particular parameter (i.e. the mass or weight of the ground state or n}^ excited state) one 
takes the priors from the previous method and does a partially constrained fit: all other 
parameters are constrained except the one in question which is unconstrained (or perhaps 
only very fightly constrained). We say that the parameter to be estimated is "released" 
(from the constraint). The statistical error estimate obtained this way is larger and a more 
conservative choice. We will, by default, release the constraint for all of the final results 
presented. 

3. "Scanning" 

Referring back to our algorithm, if a prior is available, we use it as the value of the initial 
guess. But as a new parameter is introduced there is no prior, and an initial guess must be 
obtained by some other criterion. In practice, it may happen that different initial guesses 
lead to different values for the fit parameters, especially for the unconstrained fit. This may 
signal the presence of multiple local minima in the complicated topography. To address 
this we introduce and advocate the use of "scanning" over a grid of reasonable initial values 
and selecting that choice which leads to a descent to the lowest x^. By extending the scan to 
a rather large region of parameter space, we ensure that the initial guesses for the parameters 
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will not confine the minimization search to a single basin of local attraction, but rather 
will extend over many, including that of the global minimum. 



B. Testing the Algorithm 

1. "Reconstructing Artificial Data 
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FIG. 3: Recovery of parameters from artificially constructed data. 



As there are a lot of ingredients to our modification of the standard constrained fit method 
("Onion-Shell Data", "Scanning", "Global Dynamical Weight" , "Releasing the Constraint") 
it is comforting to learn that the method can successfully reconstruct the parameters of 
artificially-constructed data where the true results are known independently of the fit. We 
created a sample of artificial data as a sum of five decaying exponentials, with means for 
masses and weights fixed at values close to those extracted from real data. Then, for 
simplicity, given this function, G{t), we added an independent Gaussian noise at each value 
of t. When run through our fitting procedure, we were able to reconstruct the masses 
and weights for the ground state, first-excited state, and second-excited state (see Fig. Oj); 
for these, the actual values were within one measured standard deviation of the measured 
masses. 



14 



2 
1.5 

a 

O 1 
0.5 

0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 

ma 

FIG. 4: Test of partitioning the data. Ground and first excited state pion mass, m^a, as a function 
of the bare quark mass ma. 

2. "Partitioning the Configurations" 

We have constructed an automated and natural way of obtaining the priors from a subset 
of the data. However, one worries that there is a significant amount of "data snooping" which 
may make it difficult to estimate the systematic errors. (Strictly speaking, using any of the 
data to obtain the priors violates the Bayesian approach.) To alleviate these worries, we 
have implemented the following test: we partition the data into two non- intersecting sets 
of configurations, A and -B, with an equal number, = = 40 of configurations in each 
set (which must still be large enough to permit stable covariant fits). Using the set A of 
configurations, we perform our procedure outlined above of obtaining the priors gradually, 
from the ground state up, as the data set is monotonically enlarged by including earlier and 
earlier time slices. Now, regarding this entire procedure as a black box solely for the purposes 
of obtaining the priors, we next use this fixed set of priors in the canonical way |i] to perform 
a constrained fit separately on data set B (for which there is no data snooping), on data set 
A (maximal data snooping), and on the full set A\JB (partial data snooping but with greater 
statistics). Any disagreement beyond statistical errors can help assess systematic errors due 
to data snooping. In our case we found no appreciable differences beyond expected statistical 
fluctuations. Figure |3] shows a plot of the ground and flrst excited state pion mass, m^ra, as 
a function of the bare quark mass ma. 



ground state (80 conf) 
ground state (40 conf) 
excited state (80 conf) 
excited state (40 conf) 
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3. "Stability 



However priors are obtained, they can then be used in a standard constrained curve fitting 
procedure, wherein the fit window tmin^^max is held fixed (and as big as feasible), while the 
number of terms in the fit model is increased one-by-one until the fit results converge for the 
lowest few parameters of interest. To test whether the Sequential Empirical Bayes Method 
has produced reliable priors, we subsequently use its priors in the standard way as 
described above. Figure El illustrates the passing of this test. 



Number of terms 

FIG. 5: In a test of stability, priors had previously been selected by the Sequential Empirical 
Bayes Method. Now they are used in the standard constrained fit. The figure shows fit values 
for the lowest two masses from constrained fits (for all t's) with different numbers of terms in the 
fit model. The data is from the pion two-point correlation function (^4^4) at bare quark mass 
ma = 0.188 (m^ = 583(3) MeV). 



4-. "Fitting Errors" 

Given the posterior probability distribution, P{p\D), all statistics may be evaluated by 
computing integrals. In practice, such evaluation is often difficult to perform directly; Monte 
Carlo methods such as simulated annealing Q can be used. Even so, the computational cost 
is often daunting and so approximations are sought. Most commonly, one may assume that 
the data set is sufficiently large that the Central Limit Theorem implies that xing is approx- 
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imately quadratic about its minimum. We in fact use this approximation, and furthermore 
use the resulting fitting errors from a fit as the priors a for the next fit. To test this, we plot 
in Fig. inithe Xaug a function of p in the neighborhood of the minimum. Superimposed is 
the parabola which is the quadratic approximation about the minimum. The two agree very 
well. Also calculated and plotted as ranges are the "fit error" , obtained from the quadratic 
approximation, and the second moment of the actual distribution. These too agree very 
well, and indicate that using the naive fit error as provided by the minimization routine is 
quite adequate. 

Although fit errors are used to produce priors for the inner loops of the algorithm, all 
final errors reported are bootstrap errors. 




0.637 0.639 0.641 0.643 0.645 
a 



FIG. 6: Plot of the augmented Xaug ™ vicinity of its minimum. It agrees well with a quadratic 
approximation, ensuring that the fit error provided by the minimization routine is adequate. The 
data is from the pion two-point correlation function at bare quark mass ma = 0.226 (m^r = 
633(10) MeV). 



5. Caveat Emptor: Further Toy Model Results 

It is crucial to be extremely conservative when making fits, especially when relying on 
Bayes' inference methods. One must be sure to avoid announcing a "false positive" , that is 
making claim which cannot be guaranteed to be correct. We can understand in more detail 
how a false positive may emerge and how to design the procedure to avoid them, by testing 
variants of the SEB algorithm against a number of "toy models" where the actual values of 
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the function's parameters are known. 

Suppose the artificial data is created with the three-state model function 



G^™^(f) = w^e'"^'' + W2e-'^^' + w^e-"^'' (18) 

\ Wi Wi J 

and we want to perform a fit of the data sample generated by this function with fixed values 
of input parameters and Gaussian statistical error 5G{t) at each time shce t. Then SEB 
should work if the following holds: 

1. There exists ati such that in the time range ti < t < imax, 

-e-("^^-"^^)*<5G'(t)-)-, (19) 

so that in this range, the data can be fitted by wie~"^^^. That is, there is a "plateau" 
in the effective mass plot for large time. 

2. There exists a ^2 such that in the time range t2 < t < ti, 

W3_^-ims-rm)t ^ SG(t)^ < ^e-^"^'-"^'^' (20) 
Wi G{t) Wi 

so that in this range, the data can be well fitted by wie~"*^* + ^26""*^*. 

3. In the time range t < t2, the third state becomes important, and the full three-state 
model must be used in the fit. 

Now the SEB is very adaptive; in the course of the iterative procedure, the ground state 
is fitted several times and allowed to fioat within a sigma or so of its current prior with each 
new fit. Thus the original criterion of the existence of a clear plateau need not be strictly 
enforced. But if the condition for the plateau is badly violated, or if the similar conditions for 
the excited states are badly violated, there is a possible danger of the algorithm gravitating 
toward a local minimum in the which is not the true value. 

To illustrate, consider artificial data, constructed from the following three-state toy model 

3 

G{t;wi,mi) = (21) 

i=l 
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in the time range < t < 15, with relative errors (uncorrelated and Gaussian) increasing 
with time t to mimic actual LGT data. 

In keeping with the spirit of the SEB, we fit first to a single exponential over the time 
interval t G [ti, tmax], then use the fitted ground state values as priors for a two-state partially- 
constrained fit in the time interval t G [t2)^max], and then use the fitted ground and first- 
excited state fit values as priors for a final three-state partially-constrained fit in the time 
interval t G [tmin, ^max], where tmin = < t2 < < ^max = 15. But to stress the caveat about 
the dangers of "false positives", we explore all values for the pair (^1,^2)- 

a. "Lowest Precision" case: With statistical errors ranging from 6G/G ~ 0.011 at 
t = 0, to ~ 0.12 at t = 15, the lowest xV^of fit is for (^1,^2) = (10, 7) with mi = 0.855(7), 
wi = 521(34), ma = 3.10(62), W2 = 0.0004(1), mg = 1.51(5), w-^ = 769(35), that is, the fit 
fails to produce the input parameters of the toy model. 

b. "Low Precision" case: If the Gaussian noise is reduced by about a factor of two to 
5G{t)/G{t) ~ 0.006 at t = 0, 6G{t)/G{t) ~ 0.05 at t = 15, then there are two solutions with 
the lowest x^/dof = 0.527: a false positive with (^1,^2) = (10,2) and mi = 0.854(4), wi = 
518(23), ma = 1.50(8), W2 = 701(63), mg = 1.86(92), W3 = 69(62), and a fit which agrees 
with the input model: (ti,t2) = (10,4) with mi = 0.853(5), Wi = 511(29), ma = 1.39(16), 
W2 = 369(188), ms = 1.65(17), W3 = 408(188). 

c. "High Precision" case: If the Gaussian noise is further reduced to 5G{t)/G{t) ~ 
0.001 at t = 0, 6G{t)/G{t) ~ 0.01 at t = 15 then the lowest xV^of = 0.530 does reproduce 
the input parameters: (^1,^2) = (10,5), mi = 0.851(1), Wi = 504(7), m2 = 1.38(6), W2 = 
451(141), ms = 1.77(10), W3 = 343(144). 

Let AG{t) be the absolute value of the difference of the function G^'^^^{t) of input param- 
eters and the function G^^^'^'^{t) of fitted parameters 

AG{t) = |G'*™'=(t) - G'^"'='^(t)|. (22) 

A fit G^^^^'^it) with reasonable small x^/dof implies that the relation 

AG{t) < 6G{t) (23) 

roughly holds at most times t. In other words, with this statistical error 6G{t), we cannot 
distinguish the fitted function G^^^^'^{t) from the original G^^^'^{t). 

From Fig. [7| we see that the statistical errors of the data set ("Lowest Precision") 
are all larger than the difference of the two functions in the full time range. This is 
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FIG. 7: The relative errors of three data sets of increasing precision ("Lowest", "Low", 
and "High") are plotted with points. The curved solid (black) curve is the plot of the 
function AG(t)/G*''"°(t), which is the relative difference between original function G*''"°(i) = 
500exp(-0.85t) +400exp(-1.35t) +400exp(-1.75t) and the false-positive fitted function G^'^^'"' = 
52ie-0-855i _^ 769e-i-5" from the data set ("Lowest Precision"). 

the reason why the false positive can be chosen by the fit routine. The dashed (blue) 
straight line (^2/""^! exp(— (m2 — mi)t) = 0.8 exp(— 0.5t)) and the dotted (red) straight line 
(ws/wi exp(— (m3 — mi)t) = 0.8 exp(— 0.9/!:)) intersect with the curve of relative errors at ti 
and t2, respectively. This shows that the ground state dominates in the range t > ti, the 
first-excited state plays a role in the range t2 < t < ti, and the three states should all be 
included in the time range t < t2- When the precision is higher, the statistical errors are 
smaller than the difference of two functions, in some time range, so that the fit can reject 
the false positive G^^^^'^^t). 

Returning to the false positive of the "Lowest Precision" case, notice that when forced to 
fit to a three-state model, the fit prefers a two-state solution with the one weight essentially 
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zero, and the remaining weight consistent with the sum of the true second and third weights 
and the remaining mass interpolating between the true masses. In short, the fit is content 
with averaging the two higher states into one. Also notice that it is the second weight which 
is zero. This is because, at the intermediate stage, a two-state model was forcing a fit to 
data completely dominated by a single state. Subsequently, as earlier times were added, the 
data could be described by two states, but since the would-be second state weight was by 
now constrained near zero, its role was supplanted by the third state in the model. 

That is, by introducing the new (second) state into the model before the data was precise 
enough to discern it, spurious results were obtained. This suggests the cure: postpone the 
introduction of new states in the model until the data demands it (as evidenced by a sudden 
increase in the x^/dof). 

In more detail, if the time range is large enough there exists a largest time ti beyond 
which the contamination of higher states will be obscured by the statistical errors and can 
be neglected so that in the time range t > ti the ground state dominates and the data 
can be well described by the single exponential ifi exp(— mit). Thus the time range t > ti 
is a good choice for the first step of SEB, and we can get a reasonably small x^/dof over 
this ground-state effective- mass plateau. If we include more time slices with t2 < t < ti, 
the contribution of the first excited state becomes significant, but the second excited state 
contributes little to the correlation function in this range. The presence of the first excited 
state results in a noticeable increase of the x^/dof if we continue to force a single exponential 
as a fit model; the inclusion of one more term in the fit model is necessary. Continue this 
procedure until the time-slices are exhausted. 

The Penultimate Algorithm: "Not-too-Soon" 

1. The available data are in the time range t^m ~ tmax- 

2. From effective mass plots, choose an initial time range t G [tstart, ^max] to do an uncon- 
strained one-mass fit. Use "scanning" - see Sect. IIII A31 

3. Include one more time slice and repeat an (independent) unconstrained one-mass fit, 
and monitor the fitted parameters mi,Wi and the x^/dof. 

4. If the fitted parameters and the x^/dof do not change much, include one more time 
slice and repeat the previous step. This iteration stops if there is a noticeable change 
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of x^/ dof and the values of the fitted parameters indicating a breakdown of the one- 
state model. Then set h — 1 equal to the time at which the x^/dof jumps, indicating 
the necessity of a two-mass fit for t < ti. Set the priors for the ground state mass and 
weight equal to the fitted values from the last low-x^/dof fit over t e [ti,tmax]- 

5. Include one more mass term in the fit model and do a partially-constrained two- mass 
fit (with scanning) for t G [ti — 1, tmax]- The ground state priors are fixed at the values 
determined by the previous step. The first-excited state is unconstrained but scanned. 

6. Repeat, adding time slices until the two-state model breaks down as indicated by a 
jump in the x^/dof. Then set t2 — I equal to the time at which the x^/dof jumps, 
indicating the necessity of a three- mass fit for t < t2- Set the priors for the ground 
state and first-excited state mass and weight equal to the fitted values from the last 
low-x^/dof fit over t e [i2,^max]- Note that the ground-state priors are refreshed. 

7. Repeat, adding more time slices and more fit-model terms (one at a time and only 
when necessary) and more until all time slices are used. 

8. The highest state in the fit model will be absorbing all the contributions from higher 
states in the true function, and thus its fitted parameters will differ from the true 
values. Thus the highest state in the fit model must be rejected. 

Now wc return to the artificial data from the toy model "Lowest Precision" . Recall that 
when we assume a three-mass fit and independently try all pairs (^1,^2), we obtained a false 
positive. But the statistical errors are so large that they obscure the contributions from 
excited states at all but the earliest time shces. Adding terms to the fit model prematurely 
led to spurious results. Now we apply the new method outlined immediately above to the 
same data. 

A one-state fit works well for t G [ti,tmax = 15] until the x^ suddenly jumps at the 
(proposed value of) ti = 3. Thus we set ti = 4. The values of the one- mass term fit over 
time range [4, 15] are used to set priors for m\ and wi, and a second term is added to the 
fit model. The two-term model works well for a fit in the range [i2, 15] with t2 varying from 
3 to 0. The two- mass model fit the data very well (x^/dof = 0.265) in the whole time range 
with fitted parameters mi = 0.855(7), Wi — 521(34), 777.2 = 1.51(5), W2 — 768(35). Since the 
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two-mass fit exhausts all the time slices and does not show a noticeable increase of x^/dof, 
we don't go ahead to include the third mass term in the fit model. 

But this fit is a "false positive" ! Since the statistical errors are larger than the difference 
between the false positive and the true solution fEq.E^. the false positive cannot be rejected 
on the basis of its x^/dof, by this or any other method. So what have we gained? The 
difference this time is that this solution is exposed as a two-mass-term fit rather than a 
spurious three-term fit. And as always, the highest state in the fit must routinely be dropped 
since in general it can be contaminated by contributions from higher excited states. 

We conclude that with the "Lowest Precision" data set, we cannot get an unambiguous 
estimate of the second and the third state. The fit is comfortable predicting the ground 
state parameters only (and they are correct). Thus the algorithm has not made a "false 
positive" claim. 

Now, we illustrate this technique with another set of artificial data for which it will be 
possible to extract excited states. It will be instructive to monitor at the behavior of the 
X^/dof, but also to see how the fit parameters of each term in the fit model stabilize as time 
slices are added. Fig. |H1 (left) plots the x^/dof for data artificially constructed as a sum of 
four decaying exponentials. A one-term fit is adequate for time slice 13 (tmax = 15). But 
it is exposed as being entirely inadequate as time slice 12 is added to the data set to be 
fit. (The x^/dof jumps tremendously because we have made the statistical errors on the 
data to be fit quite small, to illustrate the point.) So for time slices 6-12 a two-term fit 
model is used, and the fit is quite adequate. At time slice 5, however, the x^/dof jumps 
tremendously, exposing the two-term fit as inadequate. Thus a third term is added at time 
slice 5. The three-term fit works down through time slice 2. At time slice 1, a fourth term 
must be added to keep the x^/dof from being unreasonably large. 

Meanwhile, Fig. |H1 (right) monitors the masses of the lowest few states as more time slices 
are added to the data set and more terms are added (as necessary) to the fit model. At time 
slice 13, the one-term model is fitted with a ground-state mass (open circle). (Its value is 
consistent with the value obtained much later when time slices 1-12 have been added to the 
fit model and three more terms are added to the fit model. That is, time slices above 13 are 
sufficient to determine the ground-state mass, i.e the effective mass plot has "plateaued". 
But more importantly for the efficacy of the algorithm, this fitted value remains stable later 
as the data and fit model are expanded.) At time slice 12, the second term is added to 
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Time Time 

FIG. 8: Behavior of the x^/dof of the fit as earher time slices are added to the fit range. A sudden 
jump indicates that the fit model is inadequate and that a new mass term should be added to 
the model. Accordingly, then the x^/dof drops only to increase again as further time slices are 
added. "Not-too-Soon" addition of terms to the fit model prevents a single state being erroneously 
identified as two. Also shown are the behavior of the ground and first few excited states, which 
stabilize as more terms are added. 

the fit model (solid diamonds). As each earlier time is added, the first-excited state mass 
changes by a sigma or so, as the augmented data is refit. This is important. It is saying that 
if a relatively small amount of data from time slices 12 and above suggested a somewhat 
inaccurate guess for the prior for this mass, then subsequent fits including data at time slices 
8-11 can influence the value and bring it toward the correct value (at the horizontal line). 
By time slice 8, the first excited state mass has stabilized (at the correct value) and will 
not subsequently deviate through time slice 6. At time slice 5, the first-excited state mass 
deviates from the correct answer. But this is where the jump in the x^/dof warns that the 
two-state fit model is inadequate; the first-excited state fitted mass is contaminated from 
higher states in the data. Adding a third term returns the fitted first-excited state mass to 
its correct value. It remains stably at this value with the addition of earlier time slices 2-4 
to the three-state fit model (open triangles). Before the second-excited state has stabilized, 
we have run out of earlier time slices to add, and we can make no claim as to whether its 
fitted value is correct. (It isn't.) But the algorithm has successfully fitted the first-excited 
state (and of course the ground state). 

Figs, ini and ^1 tell a similar story for artificial data which is less precise (1% and 5% 
respectively). The x^/dof jump (indicating that another term should be added to the fit 
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FIG. 9: Same as in Fig. |Hlbut the artificial data is less precise (1%). Accordingly, the x^/dof 
does not jump to such extreme values when the number of the terms in the fit model becomes 
inadequate. 

model) but, as one would expect, not as extremely as for the more precise data of Figs. |H1 
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FIG. 10: Same as in Fig. Elbut the artificial data is even less precise (5%). 



6. Our Final Algorithm: "Just-in- Time" 

As we saw in the last section, the "Not-too-Soon" algorithm avoids "false positives" by 
delaying the addition of a new term to the fit model before the data were able to discern it. 
However, there is the logical possibility of the opposite danger: if the new term is added too 
late, then there could potentially be an erroneous n-state fit of data which is in fact better 
described by n + 1 states. If so, then the highest two states will be "averaged" into one. (So, 
for example, a fitted first excited state mass might take a value intermediate between the 
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true first and second excited state masses.) The addition of furtlier states to tlie fit model 
might then cover up this misidentification, and allow for this different kind of false positive. 

One would think that an increased x^/dof would warn of this danger; however, since the 
X^/dof is an average over many time slices, it's warning may come a time slice or two too 
late. If the data is such that a new term is discernible every couple of slices, then this may 
indeed lead to erroneous results. 

8 I ■ . ■ ■ . ■ ■ . 1 
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FIG. 11: Ratio of the contribution to the fit model from a proposed new term, tu„e~™"** , compared 
to the statistical error 5G{ti) at the newest time slice ti added to the data set. The new term is 
deemed not needed at time slices = 9 and = 8. At time slice ti = 7, the ratio exceeds 1 and 
the new term is conditionally accepted. The ratio remains above 1 for time slice ti = 6 and below, 
confirming that the new term is needed. The data is from a fit of the Roper resonance at the same 
quark mass as in Fig. 1141 

A final tweak of the algorithm addresses this issue: The "Just-in-Time" algorithm is the 
same as the "Not-too-Soon" algorithm outlined in the last section, except that the previous 
criterion for adding a new term to the fit model, namely that without this new term the 
X^/dof would suddenly increase, is replaced by a more sensitive criterion: As a new time 
slice ti is added to the data set, a tentative fit is made with a proposed extra term, {ti?„,m„}. 
Then the contribution of this fitted term is compared to the statistical error at the new time 
slice 

^^g-m„t, > ^Q^^,^ (24) 
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(See Fig. [TT]) If Eq. |211does not hold, then the new term is deemed not needed. If it does 
hold, then the new term is accepted (but only conditionally since the equation may hold 
only by statistical fluctuation). As new time slices are added, the new term must continue 
to pass this test. (This is stable: typically, if it passes the test once, it is likely to continue 
to pass the test at earlier time slices, since for these the relative error gets smaller and 
the fractional contribution gets larger.) If the test is failed before a higher-order term is 
conditionally added, then one returns to time slice and it is deemed that the nth term is 
not included in the fit. Then time — 1 is added to the data, a new proposal is made to 
add the nth term, and the process continues. 



IV. SOME PHYSICS RESULTS 



We have presented a list of algorithms with steadily increasing complexity but with 
steadily increasing robustness. The earliest and simplest "fixed At" algorithm, most easily 
described and presented first here as a template, was originally used (prematurely in retro- 
spect) for some preliminary conference presentations. We now deem it too unsophisticated 
and do not use it anymore. The plain vanilla "variable At" algorithm is adequate provided 
that the level spacings are well separated and each term is saturated by several time slices. 
It is perfectly adequate to describe, for instance, the pion ground and excited state, as ex- 
plained in Sect. IIVBI The plain "variable At" was used in the first version of our Roper 
paper and caused some erroneous results at high quark mass. This has been rectified by 
the use of the "Just-in-Time" algorithm for the updated version of our Roper paper and is 
briefly described in Sect. IIVCI 



The Simulation 



On a 16 X 28 lattice, we use the overlap fermion |13j and the Iwasaki gauge action 







with j3 = 2.264. The lattice spacing, determined from the measured pion decay constant /^r. 



is determined to be a = 0.200(3) fm, and thus the lattice has spatia! 
We adopt the following form for the massive Dirac operator ISL 

uiQa. 



size of 3.2 fm. 
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where 

D{p) = l + ^,e{H), (26) 

so that 

fl(™o) = P+^ + (p-^)7».(ff), (27) 

where e{H) = H/ \J H'^ is the matrix sign function and H is taken to be the hermitian 
Wilson-Dirac operator, i.e. H = '"j^D^,. Here Dyj is the usual Wilson fermion operator, 
except with a negative mass parameter — p = 1/2k — 4 in which Kc < n < 0.25. We take 
K = 0.19 in our calculation which corresponds to p = 1.368. The massive overlap action is 
defined so that the tree-level renormalization of mass and wavefunction is unity. 

we adopt .Ke Z„,.a.ev ..p,en,entation Q Q of .Ke op..a, .a.o.al app.™- 
tion 20|, |2l| to approximate the matrix sign function. The inversion of the quark matrix 
involves nested do loops in this approximation. Further details of the procedure are given 
elsewhere bllbj. 



B. Pion 
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FIG. 12: Two-point correlation function (^4^4) for the pion for three bare quark masses. 



Figure IT^ shows the correlation function for the correlator {A4A4). One can see that it 
is dominated by the ground state of the pseudoscalar channel (pion) over all but the few 
earliest time slices. This presents a problem for the default "fixed At" approach. Referring 
back to the algorithm of section III CI we see that at each step of the algorithm. At new time 
slices are added to the data while a new excited state is added to the fit model. Thus for 
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the pion correlator, one would be trying to fit to a model with many states when the data 
is saturated by just the ground state. Forcing a fit may give misleading estimates of excited 
state parameters which are subsequently used as priors. 

With the "variable At" refinement of the algorithm, rather than deciding a priori on 
the number of terms in the fit and adding time slices a fixed number at a time, one lets 
the data decide how many time slices to include with each enlargement of the data by 
choosing the minimum over a range of reasonable possibilities. Thus since the pion 
correlator is dominated by the ground state for many time slices, then many time slices will 
be automatically added before an attempt is made to fit the first-excited state. 

In fact, we find that the "variable At" method works very well for the pion correlators. 
Figure Uni shows the results of the fit, the ground- and first-excited-state weight and mass 
as a function of the bare quark mass. 
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FIG. 13: Ground and first-excited state pion mass m.,^a, as a function of the bare quark mass 
ma (left). Ground and first-excited stated pion weight, as a function of the bare quark mass ma 
(right). Notice that the ground state weight does not diverge as the quark mass approaches zero 
for this {A4A4) correlator as it would for the {PP) correlator, where P is the pseudoscalar density. 



C. The Roper Resonance 



Studies using standard curve fitting have heretofore failed to satisfactorily identify the 
"Roper resonance" N^^'^+{1U0) of the nucleon on the lattice [s^, 24, 3, Q, 22, 3, H S 
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l32| . Our analysis is the first lattice calculation to obtain the masses of the Roper and 
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5*11 (A^^/^~( 1535)) at low quark masses well in the chiral regime (with a pion mass as low as 
180 MeV). However, the effects of quenched artifacts, specifically the presence of ghost rj'N 
states, complicates the physics as well as the functional form of fit model, and so the details 
of the calculation are presented elsewhere To be sure, our calculation benefits from 
going to unprecedented low quark mass with the full chiral symmetry provided by overlap 
fermions, but the Sequential Empirical Bayes Method plays a crucial role. 

It also presented a challenge and led to the refinement of the SEB from the "variable 
At" version to the "Just-in-Time" version. The first version of our Roper results used 
the "variable At" method which, as we've seen, had until then been perfectly adequate to 
expose excited states (such as for the pion in Sec. IIVB|) where the density of excited states 
is not too high. Compared to our final analysis with the "Just-in-Time" version ^], the 
results for the nucleon and Su did not change. The only change is for the Roper state at 
medium and heavy quark masses. Indeed our first results at medium-heavy masses passed 
the "Stability Plot" test (Fig. IT^ and our focus returned to the more interesting light quark 
mass regime. Here the "variable At" criterion for when to add new states remains perfectly 
adequate for pion masses below ~ 350 MeV. 
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FIG. 14: Stability test for the Roper resonance at bare quark mass ma = 0.268 {m-,^ = 702(2) MeV). 
Priors had previously been selected by the plain "variable At" version of the Sequential Empirical 
Bayes Method. Now they are used in the standard constrained fit. The figure shows fit values for 
the lowest two masses (Nucleon and Roper) from constrained fits with different numbers of terms 
in the fit model. (At this mass, ghost states make no contribution, and so the fit model contains 
only exponentials.) 
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Subsequently, we came to realize that the "variable At" criterion that we adopted to 
introduce the excited state was not adequate for medium and heavy quarks, although it is 
adequate for the light quarks. In fitting someone else's proprietary charmonium data, we 
realized that the crucial difference between the heavy quark spectrum and that of the light 
is that the ratio of the excited state mass to that of the ground state is much smaller in the 
heavy system, that is, the excitation spectrum in the heavy quark system is more dense. The 
earlier criterion erroneously resulted in obtaining the average of the higher excited states as 
the first excited state for the heavy system. In view of this, we devised the new and final 
"Just-in- Time" criterion (Sec. IIIIB 6|1 . wherein a new term is added if its contribution is 
statistically significant at the current time slice. In fact, the penultimate "Not-too-Soon" 
algorithm fSec. 1111 B 6|) . where the criterion for adding a new term to the fit model depends 
on a sudden jump in the x^/dof , works just as well as "Just-in-Time" for the Roper resonance 
at both medium-high and low quark masses. But for some charmonium data, or other test 
data constructed with a dense excited-state spectrum, "Just-in-Time" is somewhat better 
than "Not-to-Soon" , and because of its potential theoretical advantages is our method of 
choice. 

Fig. ^1 shows that our final algorithm passes the "Stability Test" . However, so did 
the earlier "variable At" algorithm. This and the fact that the extracted excited state mass 
changed alarmingly means that the "Stability Test" llj is only a necessary but not a sufficient 
test for the reliability of constrained- curve fitting. We emphasize, however, that the source 
of the discrepancy has been identified and cured; the final algorithm can robustly protect 
against the averaging of excited states. 



D. Handling Ghost States 

In the quenched approximation, there are artifacts associated with the absence of quark 
loops. One of the more interesting consequences is that the would-be Vj propagator involves 
only double 77 poles in "hairpin diagrams" . These lead to the chiral-log terms contributing 
to hadron masses; we see these clearly in our recent lattice calculation of pion and nucleon 
masses j^. Another quenched artifact is the contribution of ghost states in the hadron 



propagators as first seen in the meson channel 



where the ghost S-wave r^'vr state 



lies lower in mass than the ao (for sufficiently small quark mass). We have seen a similar 
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FIG. 15: Same as for Fig. but for our best and final "Just-in-Time" algorithm. 



effect in analysis of the excited nucleon spectrum where a P-wave rj'N hes close in mass 
to the Roper resonance lf|, and its presence must be carefully disentangled by the fitting 
code. More dramatically, in the negative-parity channel Su (N^~), the lowest S'-wave rj'N 
=tate ha. a ,na. lower than that ofthe Sn for .nail qua* ma.. S,nce the „ couphng 
in the hairpin diagram is negative Idb], the Sn correlator changes sign with increasing time 
separation Qj. This effect is only seen at small enough quark masses, and thus is not seen 
in most lattice simulations at much higher masses. 

As a result, the form of the fitting function changes; there are extra non-exponential 
terms. Nevertheless, the SEB can successfully handle this situation. It is crucial to enforce 
the physical constraint that the weight of the ghost state be negative; otherwise there would 
be no way to distinguish it from the physical state (e.g. Roper) lying nearby. The details 
of the fitting model and the results of the fitting (using the "Just-in-Time" SEB algorithm) 
for the Roper and Su are in ^. 

Another example is displayed in Fig. where we show the ao propagator at two low 
quark masses, for which = 188(8) MeV (left) and = 213(7) MeV (right). The most 
dramatic feature of the data in each figure is the negative dip of the correlator at time slice 
3 and above. The solid (red) line passing through the data is the result of the SEB fit. The 
two dominant contributions to the fit model are negative and are indicated by the dashed 
(green) line labeled "term 1" and by the solid (cyan) curve labeled "term 2". They are 
modeled by the expression 

wi{l + E^t)e-"'v'-' (28) 
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FIG. 16: ao correlators for our lowest quark mass for which 171-,^ = 188(8) MeV (left) and = 
213(7) MeV (right). The negative dip of the correlators is an indication of the domination of the 
ghost S-wave r]'TT state over the physical oq- The curves are contributions to the fit model and are 
explained in the text. 

where Wi is constrained to be negative and the (l+ET^t) factor reflects the double-pole nature 
of the hairpin diagram [^. We fit m^/jr, the mass of the interacting would-be t]' and vr state. 
Since we work in a finite box, the tj'tc states are discrete and they are constrained to be near 



the energy of the two non-interacting particles, each with E = \jm? + I]i(~sin (^)) for 
discrete lattice momentum pi = ^ where n is an integer. For "term 1", = and for "term 
2" , n = 1. The contribution of the physical state ao is positive and is indicated by dotted 
magenta curve in each figure. In summary, our SEB algorithm is quite capable of handling 
non-standard forms of the fit model including negative-normed ghost-state contributions 
and can fit successfully the data. 



V. SUMMARY 



We have advocated refinements of Bayesian-inspired constrained-curve fitting which we 
believe better stabilizes fits at low quark mass. This permits analysis where the values 
of fit parameters are not reliably known a priori, such that their use as priors might be 
dangerously biased. 

In the "Sequential Empirical Bayes Method", we have constructed an automated and 
natural way of reliably obtaining the priors from naturally-nested subsets of the data ( "onion 
shells"). For the prototype described here, a time-dependent two-point correlation function 
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G{t), we first obtain estimates of the ground state mass and weight from unconstrained 
fits to a subset of data restricted to large times. These are then used as the priors in a 
subsequent constrained fit. A sequence of (fully or partially) constrained fits follow. At 
each step, more data from adjacent earlier times are added and another term is added to 
the fit model. When a new parameter is added, its first estimate is obtained from a fit 
in which it is unconstrained but previously introduced parameters are constrained. After 
each fit, the fitted values of each parameter are used as the priors for the next fit. Thus 
the value of each prior is free to fioat from one fit to the next, and thus is not prevented 
from changing significantly over the course of the several steps of the iteration as more and 
more data are added. This mitigates any bias which may have been introduced wherein the 
first estimates of a prior are inaccurate due to statistical fiuctuations in the smaller initial 
data set. In the final step, as the last data are added, no new terms are added to the fit 
model. A final fit for each parameter leaves it unconstrained (or very weakly constrained) 
with all other parameters constrained by priors set equal to the latest fitted values. Our final 
bootstrapped errors obtained from "releasing the constraint" in this way are larger than for 
a completely-constrained fit and are selected as a conservative estimate. 

We have found some further refinements to be fruitful: (a) We use a "scanning" technique 
to automate the selection of an initial value for a new parameter for its first unconstrained 
fit. We thus avoid the pitfall of the minimization routine becoming trapped in the 
attraction basin of a local, but not global minimum, (b) The weight factor (essentially a 
Lagrange multiplier). A, balances the infiuence of the data versus the priors in determining 
the output of a fit. Decreasing A from its canonical Bayesian value of 1, can be used to 
assess the systematic error associated with the choice of prior (s). We advocate absorbing 
this systematic error into a statistical error by promoting A to be "global dynamical weight", 
that is, a fit parameter with its own prior mean and standard deviation. 

We have tested that our algorithm can successful recover the correct fit parameters of 
an artificial data set, constructed as a sum of decaying exponentials with realistic values 
of the parameters. For one channel (corresponding to using only the local-local two-point 
correlation function), the masses and weights of the ground and first-excited states arc fit to 
within a measured standard deviation of the true values. Thus, for our real data, we restrict 
ourselves to measuring only the lowest two states, even though we use four or five states in 
the fit model. This is sufficient for our present purposes. 
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Our method is not in strict accord with the Bayesian philosophy, as we use subsets of the 
data to guide the selection of the priors. Nevertheless, the following mitigates bias from such 
data snooping: (a) The data is naturally nested in that a subset of data restricted to large 
times can provide a fair estimate of the lowest state parameters. Adiabatically increasing 
the data set with the introduction of new terms in the fit model gives fair estimates of 
excited-state parameters to be used as priors, (b) The priors are given ample opportunity 
to change in accordance with the data in the several steps of the algorithm. Is this enough? 
To test this we "partition the data" by configuration into equally-sized but disjoint sets and 
apply our algorithm to one half to obtain priors and then use these priors in a standard 
constrained fit on the other half (in strict accord with the Bayesian philosophy) and on the 
full data set. We see no difference beyond expected statistical errors; that is, no bias is seen. 

In further studies using artificially constructed data with known means and errors ("toy 
models") we have presented some caveats against the careless application of the algorithm 
which might result in a state being spuriously identified as a combination of two, leading to 
a "false positive" prediction. These false positives were avoided in the toy model studies by 
insisting that a new term be added to the fit model "Just-in-Time" , that is only after a fit 
with fewer terms is deemed inadequate. 

For this pilot study, we have analyzed the ground and first-excited masses and weights for 
the pion from overlap fermions on a quenched 16^ x 28 lattice with spatial size La = 3.2 fm 
and pion mass as low as 180 MeV, commented on the history of the use and suitability of 
variations of the SEB algorithm for extracting the Roper resonance Q], and given another 
example (ao) where the method can handle ghost states. 
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